EXACT PROPERTIES OF EFRON'S BIASED COIN
RANDOMIZATION PROCEDURE 

By Tigran Markaryan and William F. Rosenberger 1 

George Mason University 

Efron [Biometrika 58 (1971) 403-417] developed a restricted ran- 
domization procedure to promote balance between two treatment 
groups in a sequential clinical trial. He called this the biased coin 
design. He also introduced the concept of accidental bias, and inves- 
tigated properties of the procedure with respect to both accidental 
and selection bias, balance, and randomization-based inference us- 
ing the steady-state properties of the induced Markov chain. In this 
paper we revisit this procedure, and derive closed-form expressions 
for the exact properties of the measures derived asymptotically in 
Efron's paper. In particular, we derive the exact distribution of the 
treatment imbalance and the variance-covariance matrix of the treat- 
ment assignments. These results have application in the design and 
analysis of clinical trials, by providing exact formulas to determine 
the role of the coin's bias probability in the context of selection and 
accidental bias, balancing properties and randomization-based infer- 
ence. 

1. Introduction. Efron (1971) introduced his famous biased coin design 
as a method that "...tends to balance the experiment, but at the same 
time is not over vulnerable to various common forms of experimental bias." 
The primary application is in sequential clinical trials where balance in the 
numbers randomly assigned to two treatment groups is sometimes desirable 
for power considerations. In such cases, it is also desirable to maintain near- 
balance at intermediate points in the trial as heterogeneity or time trends 
in patient characteristics may lead to less comparable treatment arms. Randomization protects against imbalances in unknown covariates related to outcomes (which Efron referred to as accidental bias, a concept he introduced in the 1971 paper) and against selection bias, and it provides a basis for inference.
Efron explored the balancing properties of the biased coin design, as well 
as its susceptibility to selection and accidental bias, and discussed the im- 
plications for randomization-based inference. All of these results were based 
on studying the steady-state properties of the Markov chain induced by the 
imbalance process of biased coin randomization. 

Let $\mathbf{T}_n = (T_1, \ldots, T_n)'$ be a randomization sequence, where $T_j = 1$ if treatment A is assigned and $T_j = -1$ if treatment B is assigned, $j = 1, \ldots, n$. After $j$ assignments, let $D_j$ be the difference between the numbers of patients assigned to treatments A and B; that is, $D_j = \sum_{i=1}^{j} T_i$, with $D_0 = 0$. The biased coin design with bias $p \in [0.5, 1]$, denoted BCD($p$), is defined by

$$
P(T_j = 1 \mid D_{j-1}) =
\begin{cases}
1/2, & D_{j-1} = 0,\\
p, & D_{j-1} < 0,\\
1 - p, & D_{j-1} > 0,
\end{cases}
\qquad j = 1, 2, 3, \ldots
$$
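The definition above can be simulated directly. The following Python sketch is illustrative only; the function names `bcd_assignments` and `imbalance_path` are hypothetical, and the caller is assumed to pass a seeded `random.Random` when reproducibility matters.

```python
import random


def bcd_assignments(n, p, rng=None):
    """Simulate n assignments from Efron's BCD(p): +1 = treatment A, -1 = B."""
    if not 0.5 <= p <= 1.0:
        raise ValueError("p must lie in [0.5, 1]")
    rng = rng or random.Random()
    d = 0  # current imbalance D_{j-1}
    seq = []
    for _ in range(n):
        if d == 0:
            prob_a = 0.5        # balanced: toss a fair coin
        elif d < 0:
            prob_a = p          # A lags behind: favour A
        else:
            prob_a = 1.0 - p    # A is ahead: favour B
        t = 1 if rng.random() < prob_a else -1
        seq.append(t)
        d += t
    return seq


def imbalance_path(seq):
    """Running imbalances D_1, ..., D_n of an assignment sequence."""
    path, d = [], 0
    for t in seq:
        d += t
        path.append(d)
    return path


# With p = 1 every second assignment is forced, so |D_j| <= 1 throughout.
print(imbalance_path(bcd_assignments(10, 1.0, random.Random(1))))
```

With $p = 1$ only the first assignment of each pair is random, which is the permuted block design with block size 2 mentioned below.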

Note that p = 0.5 results in complete randomization and p = 1 results in a 
permuted block design with block size of 2, in which case every alternate 
assignment is deterministic. Efron notes that the $\{|D_n|\}_{n=1}^{\infty}$ process forms a Markov chain of period 2 with states $0, 1, 2, \ldots$ and a reflecting barrier at the origin. He then proves that the $|D_n|$ process has stationary probabilities $\pi_j$ given by

$$
\pi_j =
\begin{cases}
\dfrac{r - 1}{2r}, & j = 0,\\[6pt]
\dfrac{r^2 - 1}{2r^{j+1}}, & j \ge 1,
\end{cases}
\tag{1.1}
$$

where $q = 1 - p$ and $r = p/q > 1$. Efron uses these stationary probabilities to obtain the limiting probabilities of perfect balance ($n$ even) and of an imbalance of 1 ($n$ odd):

$$
\lim_{m \to \infty} P(|D_{2m}| = 0) = 2\pi_0 = \frac{r - 1}{r},
\qquad
\lim_{m \to \infty} P(|D_{2m+1}| = 1) = 2\pi_1 = \frac{r^2 - 1}{r^2}.
$$
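As a check on (1.1), the sketch below evaluates Efron's stationary probabilities and the two limits above. It assumes $1/2 < p < 1$ (so that $r$ is finite and exceeds 1); passing a `fractions.Fraction` gives exact values. The function names are hypothetical.

```python
from fractions import Fraction


def stationary_prob(j, p):
    """Efron's stationary probability pi_j of |D_n| = j, eq. (1.1); needs 1/2 < p < 1."""
    r = p / (1 - p)
    if j == 0:
        return (r - 1) / (2 * r)
    return (r * r - 1) / (2 * r ** (j + 1))


def limiting_balance(p):
    """(lim P(|D_2m| = 0), lim P(|D_2m+1| = 1)) = (2 pi_0, 2 pi_1)."""
    return 2 * stationary_prob(0, p), 2 * stationary_prob(1, p)


# Exact arithmetic for p = 2/3 (r = 2): the limits are 1/2 and 3/4.
print(limiting_balance(Fraction(2, 3)))
```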

Most research on the theory of randomization in recent years has focused 
on generalizations of Efron's procedure [see, e.g., Wei (1978), Soares and Wu 
(1982), Eisele (1994), Chen (1999), Baldi Antognini and Giovagnoli (2004) 
and Hu and Zhang (2004)] rather than Efron's procedure itself. In particular, 
Baldi Antognini and Giovagnoli's (2004) "adjustable biased coin design" is 
stochastically more balanced, and therefore uniformly more powerful, than 
the other procedures [Baldi Antognini (2008)]. 

The remainder of Efron's article is devoted to selection bias, as defined 
by Blackwell and Hodges (1957), accidental bias and randomization as a 
basis for inference. Efron notes that the best guessing strategy against the BCD($p$) is always to guess the treatment that has occurred least often up to that point (guessing at random when the groups are balanced). The probability of correctly guessing at the $j$th step is

$$
\tfrac{1}{2}P(D_{j-1} = 0) + p\,P(|D_{j-1}| > 0), \tag{1.2}
$$

which, averaged over consecutive steps, asymptotically approaches $1/2 + (r - 1)/(4r)$; the procedure therefore has asymptotic excess selection bias of

$$
\frac{r - 1}{4r}. \tag{1.3}
$$
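A Monte Carlo sketch of the guessing strategy behind (1.2) (our illustration, not Efron's computation): a simulated observer always guesses the arm that has occurred least often, and the long-run fraction of correct guesses is compared with $1/2 + (r - 1)/(4r)$. The seed, run length and function names are arbitrary assumptions.

```python
import random


def observed_guess_rate(n, p, seed=0):
    """Fraction of correct guesses over n BCD(p) assignments when the observer
    always guesses the arm that has occurred least often (fair coin on ties)."""
    rng = random.Random(seed)
    d = correct = 0
    for _ in range(n):
        if d == 0:
            guess, prob_a = (1 if rng.random() < 0.5 else -1), 0.5
        elif d < 0:
            guess, prob_a = 1, p          # A lags: guess A, assigned with prob p
        else:
            guess, prob_a = -1, 1 - p     # B lags: guess B, assigned with prob p
        t = 1 if rng.random() < prob_a else -1
        correct += (t == guess)
        d += t
    return correct / n


def asymptotic_guess_rate(p):
    """Limit 1/2 + (r - 1)/(4r) of the average correct-guess probability, 1/2 < p < 1."""
    r = p / (1 - p)
    return 0.5 + (r - 1) / (4 * r)


# Long-run simulated rate versus the limit (0.625 for p = 2/3).
print(observed_guess_rate(200_000, 2 / 3, seed=1), asymptotic_guess_rate(2 / 3))
```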



Accidental bias refers to the squared bias of the treatment effect in a linear regression when an unknown covariate $\mathbf{z}$ is left out of the model. Efron derives this bias as

$$
E(\mathbf{z}'\mathbf{T}_n)^2 = \mathbf{z}'\Sigma_{\mathbf{T}_n}\mathbf{z},
$$

where $\Sigma_{\mathbf{T}_n} = \operatorname{Var}(\mathbf{T}_n)$. He suggests a minimax approach by noting that

$$
\mathbf{z}'\Sigma_{\mathbf{T}_n}\mathbf{z} \le \lambda_{\max}(\Sigma_{\mathbf{T}_n}), \tag{1.4}
$$

where $\lambda_{\max}$ denotes the maximum eigenvalue and the inequality follows from the assumption that $\|\mathbf{z}\| = 1$. Note that the smallest possible value of the maximum eigenvalue is 1 (every diagonal entry of $\Sigma_{\mathbf{T}_n}$ equals 1), which corresponds to complete randomization. Instead of examining $\Sigma_{\mathbf{T}_n}$ directly (which he acknowledges is difficult), Efron looks at the much simpler process $T_1, T_2, T_3, \ldots$, assumes that it is stationary, and aims at finding its asymptotic covariance structure. He then shows that $\lambda_N$, the limit as $h \to \infty$ of the maximum eigenvalue of the covariance matrix of $(T_{h+1}, \ldots, T_{h+N})$, is increasing in $N$ and has a finite limit. Based on numerical evidence, Efron conjectures that $\lim_{N \to \infty} \lambda_N = 1 + (p - q)^2$. This was later proved by Steele (1980). However, Smith (1984) shows by counterexample that Efron's solution may be unsatisfactory when there are short-term dependencies in the data.

In this paper, we derive exact properties of Efron's procedure. In particular, in Section 2, we derive a closed-form expression for the distribution of $D_n$ and give the explicit form of $\Sigma_{\mathbf{T}_n}$. These formulas are remarkably compact given the complexity of the problems. We describe computational considerations in Section 3. In Section 4, we apply these results to derive an explicit form for the excess selection bias, prove a result on the maximum eigenvalue of $\Sigma_{\mathbf{T}_n}$ and discuss randomization as a basis for inference. We also compare the exact results with Efron's for various $n$ and $p$. In Section 5, we draw conclusions. Finally, all proofs are given in Appendices A-C.
2. Exact distribution of $D_n$ and $\Sigma_{\mathbf{T}_n}$. We will assume the following conventions throughout the mathematical developments.

1. For brevity, we treat a binomial coefficient $\binom{x}{y}$ as zero whenever any of the following conditions is true: $x < 0$, $y < 0$, $x < y$, or $y$ is not an integer.

2. We treat a summation as zero whenever its upper limit is smaller than its lower limit.

3. We treat probabilities conditional on zero-probability events as 0.

The distribution of $D_n$ requires determination of the exact distribution of a denumerable homogeneous random walk. The following result is given as the first theorem.

Theorem 2.1. Let $n = 1, 2, 3, \ldots$, $0 \le k \le n$, and let $n$ and $k$ have the same parity. Then the distribution of $D_n$ of the BCD($p$) is given by formulas (2.1) and (2.2), together with the symmetry $P(D_n = -k) = P(D_n = k)$.

For $k > 0$,

$$
P(D_n = k) = \frac{1}{2}\,p^{(n-k)/2}\sum_{l=0}^{(n-k)/2}\frac{n+k-2l}{n+k+2l}\binom{(n+k)/2+l}{l}q^{k+l-1}. \tag{2.1}
$$

For $k = 0$,

$$
P(D_n = 0) = p^{n/2}\sum_{l=0}^{n/2-1}\frac{n-2l}{n+2l}\binom{n/2+l}{l}q^{l}. \tag{2.2}
$$

Proof. See Appendix A. □ 
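A minimal Python sketch of (2.1) and (2.2), assuming exact rational arithmetic through `fractions.Fraction`. It cross-checks the closed form against the one-step recursion of Proposition A.1 in Appendix A. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_imbalance(n, k, p):
    """P(D_n = k) for BCD(p) from (2.1)-(2.2), n >= 1; exact if p is a Fraction."""
    q, k = 1 - p, abs(k)          # symmetry: P(D_n = k) = P(D_n = -k)
    if k > n or (n - k) % 2:
        return 0 * p              # zero-probability state
    if k == 0:                    # eq. (2.2)
        h = n // 2
        return p ** h * sum(Fraction(n - 2 * l, n + 2 * l) * comb(h + l, l) * q ** l
                            for l in range(h))
    h, s = (n - k) // 2, (n + k) // 2   # eq. (2.1)
    return p ** h * sum(Fraction(n + k - 2 * l, n + k + 2 * l) * comb(s + l, l) * q ** (k + l - 1)
                        for l in range(h + 1)) / 2


def imbalance_law(n, p):
    """Law of D_n by forward recursion over the Markov chain (Proposition A.1)."""
    law = {0: 1}
    for _ in range(n):
        nxt = {}
        for d, w in law.items():
            up = Fraction(1, 2) if d == 0 else (p if d < 0 else 1 - p)
            nxt[d + 1] = nxt.get(d + 1, 0) + w * up
            nxt[d - 1] = nxt.get(d - 1, 0) + w * (1 - up)
        law = nxt
    return law


p = Fraction(3, 5)
print(prob_imbalance(3, 1, p))   # (p/2)(1 + q) = 21/50
```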

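The compact form of these equations arises from patterns in polynomials of $p$ and $q$ that can be seen developing for small $n$ as $n$ increases. The proof is then by induction. Note that the distribution of $N_A(n)$, the number of patients assigned to treatment A, follows immediately, since $N_A(n) = (D_n + n)/2$.

Define $t_k = P(T_n = 1 \mid D_{n-1} = k)$; by the definition of the BCD($p$), $t_0 = 1/2$, $t_k = p$ for $k < 0$ and $t_k = q$ for $k > 0$, whatever the value of $n$. We now derive the covariance of $(T_n, T_m)$.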

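Theorem 2.2. Let $1 \le n < m$. Then the joint distribution of $(T_n, T_m)$ of the BCD($p$), $p \in [1/2, 1]$, is given by

$$
P(T_n = 1, T_m = 1) = \sum_{k=-n+1}^{n-1}\Bigl[t_{k+1} + \Bigl(\tfrac{1}{2} - t_{k+1}\Bigr)F^{(m-n-1)}_{k+1,0}\Bigr]\,d_{n-1,k}\,t_k, \tag{2.3}
$$

where $d_{n,k} = P(D_n = k)$ is given in (2.1) and (2.2), with $d_{0,0} = 1$, and

$$
F^{(u)}_{k,0} =
\begin{cases}
\displaystyle\sum_{l=|k|}^{u}\frac{|k|}{l}\binom{l}{(l+|k|)/2}p^{(l+|k|)/2}q^{(l-|k|)/2}, & k \ne 0,\\[6pt]
1, & k = 0,
\end{cases}
\tag{2.4}
$$

is the probability that the imbalance process started at state $k$ reaches state 0 within $u$ steps (see Lemma B.2).

Proof. See Appendix B. □

The form of $\Sigma_{\mathbf{T}_n}$ follows immediately, since $E(T_i) = 0$ and, by symmetry, $E(T_iT_j) = 4P(T_i = 1, T_j = 1) - 1$:

Corollary 2.1. Let $\Sigma_{\mathbf{T}_n}$ be the covariance matrix of $\mathbf{T}_n$ of the BCD($p$), $p \in [1/2, 1]$. Then the $(i,j)$th entry of the matrix, $\sigma_{ij}$, where $1 \le i \le j \le n$, is given by

$$
\sigma_{ij} =
\begin{cases}
\displaystyle 4\sum_{k=-i+1}^{i-1}\Bigl[t_{k+1} + \Bigl(\tfrac{1}{2} - t_{k+1}\Bigr)F^{(j-i-1)}_{k+1,0}\Bigr]\,d_{i-1,k}\,t_k - 1, & i < j,\\[6pt]
1, & i = j,
\end{cases}
\tag{2.5}
$$

where $F^{(u)}_{k,0}$ is defined in (2.4).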
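The sketch below evaluates (2.3) to (2.5) in exact arithmetic and compares the result with a brute-force enumeration of all $2^n$ assignment sequences, which is feasible only for small $n$. The helpers `t`, `d` and `F` mirror the symbols $t_k$, $d_{n,k}$ and $F^{(u)}_{k,0}$ but are our own illustrative code, not the authors' implementation.

```python
from fractions import Fraction
from itertools import product
from math import comb


def t(k, p):
    """t_k = P(T_n = 1 | D_{n-1} = k) under BCD(p)."""
    return Fraction(1, 2) if k == 0 else (p if k < 0 else 1 - p)


def d(n, k, p):
    """d_{n,k} = P(D_n = k) from (2.1)-(2.2), with d_{0,0} = 1."""
    q, k = 1 - p, abs(k)
    if n == 0:
        return Fraction(int(k == 0))
    if k > n or (n - k) % 2:
        return Fraction(0)
    if k == 0:
        h = n // 2
        return p ** h * sum(Fraction(n - 2 * l, n + 2 * l) * comb(h + l, l) * q ** l for l in range(h))
    h, s = (n - k) // 2, (n + k) // 2
    return p ** h * sum(Fraction(n + k - 2 * l, n + k + 2 * l) * comb(s + l, l) * q ** (k + l - 1)
                        for l in range(h + 1)) / 2


def F(k, u, p):
    """F_{k,0}^{(u)} of (2.4): reach 0 from k within u steps."""
    q, k = 1 - p, abs(k)
    if k == 0:
        return Fraction(1)
    return sum((Fraction(k, l) * comb(l, (l + k) // 2) * p ** ((l + k) // 2) * q ** ((l - k) // 2)
                for l in range(k, u + 1) if (l - k) % 2 == 0), Fraction(0))


def sigma(i, j, p):
    """Entry (i, j) of Cov(T_n) via Corollary 2.1 (indices are 1-based)."""
    i, j = min(i, j), max(i, j)
    if i == j:
        return Fraction(1)
    joint = Fraction(0)                     # P(T_i = 1, T_j = 1), eq. (2.3)
    for k in range(-(i - 1), i):
        cond = t(k + 1, p) + (Fraction(1, 2) - t(k + 1, p)) * F(k + 1, j - i - 1, p)
        joint += cond * d(i - 1, k, p) * t(k, p)
    return 4 * joint - 1


def sigma_brute(n, p):
    """Cov(T_n) by enumerating all 2^n assignment sequences (small n only)."""
    cov = [[Fraction(0)] * n for _ in range(n)]
    for seq in product((1, -1), repeat=n):
        prob, imb = Fraction(1), 0
        for x in seq:
            prob *= t(imb, p) if x == 1 else 1 - t(imb, p)
            imb += x
        for a in range(n):
            for b in range(n):
                cov[a][b] += prob * seq[a] * seq[b]   # E(T) = 0, so this is the covariance
    return cov


print(sigma(1, 2, Fraction(2, 3)))   # q - p = -1/3
```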

3. Computational considerations. This section contains some observa- 
tions on the computation of P(D n = k) according to formulas (2.1) and 
(2.2) and the computation of P(T n = l,T m = 1) according to formulas (2.3) 
and (2.4). These formulas involve terms that are products of large factorials 
and powers of numbers that are between 0 and 1. The key is to calculate
these products in such order that the result does not get too large or too 
small too quickly. We focus on the computation of (2.1) here as the other 
formulas are similar. For n < 100, calculating the combination and multi- 
plying by powers of p and q directly works well. However, for larger values 
of n, precision may be lost if the intermediate products become too large or 
too small. 

Formula (2.1) involves (n — k)/2 + 1 terms, each of which is a product of 
powers of p, powers of q, positive integers and reciprocals of positive integers. 
The generic term of the right-hand side of (2.1) can be written as

$$
\frac{1}{2}\,\frac{n+k-2l}{n+k+2l}\binom{(n+k)/2+l}{l}\,p^{(n-k)/2}q^{k+l-1}, \qquad l = 0, 1, \ldots, (n-k)/2. \tag{3.1}
$$

There are $(n+k)/2 + 2l$ factors less than 1 and $l$ factors greater than 1. Denote these two groups by $\{a_s\}_{s=1}^{(n+k)/2+2l}$ and $\{b_s\}_{s=1}^{l}$, respectively, and assume that the $a_s$ are indexed in decreasing order. The following simple algorithm ensures that the running products for calculating (3.1) do not become too large or too small early in the calculation process.

1. Fix a number M that the running product cannot exceed. Any number 
that is larger than 2n will work. 

2. Fix a number m that is close to the machine epsilon. When the running 
product gets close to m, the algorithm will know that further multiplica- 
tion by small numbers may result in loss of precision. 

3. Start multiplying the running product by numbers in $\{b_s\}_{s=1}^{l}$ until it exceeds $M$.

4. Multiply the running product by numbers in $\{a_s\}_{s=1}^{(n+k)/2+2l}$ until it is less than $M$.

5. Iterate Steps 3 and 4 until the numbers in $\{b_s\}_{s=1}^{l}$ are depleted.

6. Continue multiplying the running product by the remaining numbers in $\{a_s\}_{s=1}^{(n+k)/2+2l}$, from the largest to the smallest. Two cases are possible: either the final product is larger than $m$, in which case the algorithm is complete, or at some point the running product becomes less than $m$, in which case one can save the result as a product of two or more small numbers.
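A Python sketch of Steps 1 to 6, under stated assumptions: `big` and `tiny` play the roles of $M$ and $m$ (the default values are arbitrary choices, not taken from the paper), and the factors greater than 1 are fed one at a time, with factors less than 1 applied as soon as the running product exceeds `big`, which reproduces the order of Steps 3 to 5. The function name is hypothetical.

```python
import math


def balanced_product(small, large, big=1e100, tiny=1e-250):
    """Multiply factors in an order that keeps the running product in range.

    small: factors in (0, 1]; large: factors > 1. Returns a list of floats
    whose product is the full product; it has more than one entry only when
    the result itself falls below `tiny` (Step 6).
    """
    a = sorted(small, reverse=True)   # a_s indexed in decreasing order
    ia, run, parts = 0, 1.0, []
    for b in large:                   # Steps 3-5
        run *= b
        while run > big and ia < len(a):
            run *= a[ia]
            ia += 1
    for x in a[ia:]:                  # Step 6: remaining a_s, largest first
        run *= x
        if run < tiny:
            parts.append(run)         # keep the result as a product of small numbers
            run = 1.0
    parts.append(run)
    return parts


# C(2000, 1000) / 2^2000: multiplying the large factors first would overflow.
N = 2000
large = [N // 2 + s for s in range(1, N // 2 + 1)]          # numerators of C(N, N/2)
small = [1 / s for s in range(1, N // 2 + 1)] + [0.5] * N   # 1/s factors and 2^-N
print(math.prod(balanced_product(small, large)))            # about sqrt(2 / (pi N))
```

The example is a stand-in with the same structure as (3.1) (integers, reciprocals and powers of a number below 1); the factor lists for a particular term of (2.1) would be built the same way.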

For example, we used the algorithm to compute Table 1, which gives the value of $n$ starting at which the steady-state probabilities are within certain percentages of the exact probability $P(D_n = k)$, for various $k$ and $p$. The same idea can be used for calculating (2.4).

Finally, for the computation of $\Sigma_{\mathbf{T}_n}$, the following proposition gives a property of the matrix that can facilitate computation. The proof follows from Corollary 2.1 and Lemma B.1 and is omitted.

Proposition 3.1. If $\Sigma_{\mathbf{T}_n}$ is partitioned into $2 \times 2$ submatrices, then all the off-diagonal submatrices are constant (i.e., all four entries of each such submatrix are equal).

4. Applications to clinical trials. In this section we apply the results of 
Section 2 to the study of balancing properties of the randomization pro- 
cedure, selection and accidental biases and randomization as a basis for 
inference. Each of these is a consideration in the appropriate selection of 
a randomization procedure in clinical trials [see Rosenberger and Lachin 
(2002)]. 
Table 1
Values of n starting at which the steady-state probabilities are within 10%, 5%, 1% and 0.1% of $P(D_n = k)$, k = 0, 1, 2, 25, 50

                  Errors within
 k      p      10%     5%     1%    0.1%
 0     0.6      20     34     74      ?
       0.7       6      8     18     34
       0.8       2      4      8     14
       0.9       2      2      4      6
 1     0.6      19     33     73      ?
       0.7       5      7     17     33
       0.8       1      3      7     13
       0.9       1      1      3      5
 2     0.6      14     28     68    140
       0.7       4      4      8     22
       0.8       4      4      8     14
       0.9       2      4      6      8
25     0.6     183    211    279    379
       0.7      85     93    113    141
       0.8      53     57     65     77
       0.9      37     39     43     49
50     0.6     342    380    464   >500
       0.7     158    168    194    226
       0.8     100    104    116    130
       0.9      70     72     78     86
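Table 1 can be regenerated, at least in principle, with a short search. The sketch below assumes that "within x%" means the relative error $|2\pi_k - P(|D_n| = k)| / P(|D_n| = k) < x/100$ (equivalent to comparing $\pi_k$ with $P(D_n = k)$ when $k \ge 1$), and it reports the first $n$ meeting the tolerance without checking that the error stays below it afterwards. It uses exact rationals rather than the floating-point algorithm of Section 3, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_abs_imbalance(n, k, p):
    """P(|D_n| = k) from Theorem 2.1; assumes n >= 1, 0 <= k <= n, n - k even."""
    q = 1 - p
    if k == 0:
        h = n // 2
        return p ** h * sum(Fraction(n - 2 * l, n + 2 * l) * comb(h + l, l) * q ** l
                            for l in range(h))
    h, s = (n - k) // 2, (n + k) // 2   # twice the right-hand side of (2.1)
    return p ** h * sum(Fraction(n + k - 2 * l, n + k + 2 * l) * comb(s + l, l) * q ** (k + l - 1)
                        for l in range(h + 1))


def limiting_abs_imbalance(k, p):
    """Steady-state value 2 * pi_k from (1.1), for n of the same parity as k."""
    r = p / (1 - p)
    return (r - 1) / r if k == 0 else (r * r - 1) / r ** (k + 1)


def first_n_within(k, p, tol, n_max=1000):
    """First n (same parity as k) at which the steady-state value is within a
    relative error tol of the exact P(|D_n| = k); None if not found by n_max."""
    target = limiting_abs_imbalance(k, p)
    n = k if k > 0 else 2
    while n <= n_max:
        exact = prob_abs_imbalance(n, k, p)
        if abs(target - exact) < tol * exact:
            return n
        n += 2
    return None


# Row k = 0, p = 0.9 of Table 1 (thresholds 10%, 5%, 1%, 0.1%).
print([first_n_within(0, Fraction(9, 10), Fraction(t, 1000)) for t in (100, 50, 10, 1)])  # [2, 2, 4, 6]
```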



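4.1. Balancing properties of the biased coin design. All finite-sample balancing properties of the biased coin design can be investigated with the help of Theorem 2.1, which provides the means for exact calculation of probabilities involving $P(D_n = k)$. In particular, the exact variance is given in the following proposition:

Proposition 4.1. The exact variance of $D_n$ is given by

$$
\operatorname{Var}(D_n) = \sum_{\substack{k=1\\ n-k\ \text{even}}}^{n} k^2\,p^{(n-k)/2}\sum_{l=0}^{(n-k)/2}\frac{n+k-2l}{n+k+2l}\binom{(n+k)/2+l}{l}q^{k+l-1}. \tag{4.1}
$$

This follows from (2.1), since $E(D_n) = 0$ and $P(D_n = -k) = P(D_n = k)$, so that $\operatorname{Var}(D_n) = 2\sum_{k \ge 1} k^2 P(D_n = k)$.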
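A sketch of (4.1) in exact arithmetic, cross-checked against the second moment of $D_n$ obtained by forward recursion over the imbalance chain (a swapped-in numerical check, not part of the proof). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def var_imbalance(n, p):
    """Exact Var(D_n) of BCD(p) from (4.1); exact when p is a Fraction."""
    q = 1 - p
    total = 0
    for k in range(1 if n % 2 else 2, n + 1, 2):     # k >= 1 with n - k even
        h, s = (n - k) // 2, (n + k) // 2
        inner = sum(Fraction(n + k - 2 * l, n + k + 2 * l) * comb(s + l, l) * q ** (k + l - 1)
                    for l in range(h + 1))
        total += k * k * p ** h * inner
    return total


def var_imbalance_recursive(n, p):
    """E(D_n^2) (= Var(D_n), since E(D_n) = 0) from the forward recursion of the chain."""
    law = {0: 1}
    for _ in range(n):
        nxt = {}
        for d, w in law.items():
            up = Fraction(1, 2) if d == 0 else (p if d < 0 else 1 - p)
            nxt[d + 1] = nxt.get(d + 1, 0) + w * up
            nxt[d - 1] = nxt.get(d - 1, 0) + w * (1 - up)
        law = nxt
    return sum(d * d * w for d, w in law.items())


print(var_imbalance(5, Fraction(9, 10)))   # 137/125, i.e. 1.096 (Table 2: 1.10)
```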



The variance of the imbalance of the biased coin design for different values 
of n and p is provided in Table 2. Also given in the table is the limiting 
variance based on the steady state distribution of the induced Markov chain. 
The formulas for odd and even $n$ are given in the following proposition, which follows directly from (1.1).

Proposition 4.2. Under the limiting distribution of the BCD($p$), $p \in (1/2, 1)$, the variance of the imbalance is given by

$$
\operatorname{Var}(D_n) =
\begin{cases}
\dfrac{4r(r^2+1)}{(r^2-1)^2}, & \text{when the number of trials is even},\\[8pt]
\dfrac{8r^2}{(r^2-1)^2} + 1, & \text{when the number of trials is odd}.
\end{cases}
\tag{4.2}
$$

As can be seen in the table, odd and even $n$ form different patterns. This is due to the differences in the supports of the distributions; in particular, a significant mass is concentrated at 0 when $n$ is even and $p$ is large. Note that both odd and even $n$ form an increasing series for each $p$. This is expected and follows from Theorem 1 in Efron (1971) with $h(j) = j^2$. It is also the case that $\operatorname{Var}(D_n)$ is a decreasing function of $p$ for each $n$. This is also expected and was proved by Efron as Theorem 3 with $h(j) = j^2$. The balancing properties stabilize for moderate-sized trials of around 75 to 100 patients. This contrasts with other randomization procedures, such as complete randomization and the urn design [Wei (1978)], for which the variance of $D_n$ grows at the rate $O(n)$ [Rosenberger and Lachin (2002), Chapter 3].

Table 2
Variance of the imbalance of the BCD(p) for different values of n and p

                  p
   n       0.6     0.7     0.8     0.9
n even
  10      5.19    2.55    1.18    0.46
  20      7.65    2.91    1.21    0.46
  50     10.78    3.04    1.21    0.46
 100     12.10    3.04    1.21    0.46
 200     12.45    3.04    1.21    0.46
   ∞     12.48    3.04    1.21    0.46
n odd
   5      3.30    2.15    1.45    1.10
  15      6.63    2.95    1.56    1.10
  25      8.52    3.13    1.57    1.10
  75     11.73    3.20    1.57    1.10
   ∞     12.52    3.21    1.57    1.11
Table 3
Average excess selection bias of the BCD(p) for different values of n and p

                  p
   n       0.6     0.7     0.8     0.9
   5     0.058   0.107   0.146   0.177
  10     0.070   0.129   0.178   0.217
  15     0.072   0.129   0.173   0.207
  20     0.075   0.136   0.183   0.220
  25     0.076   0.135   0.179   0.213
  50     0.080   0.140   0.186   0.221
  75     0.081   0.140   0.185   0.219
 100     0.081   0.141   0.187   0.222
 200     0.082   0.142   0.187   0.222
   ∞     0.083   0.143   0.188   0.222



4.2. Selection bias. Theorem 2.1 allows us to calculate the selection bias of the BCD($p$) using (1.2). When $n$ is even, $P(D_{n-1} = 0) = 0$, and therefore the selection bias at the $n$th step is $p$. Obviously, the selection bias when $n = 1$ is $1/2$. When $n$ is an odd number exceeding 1, so that $n = 2m + 1$ with $m \in \mathbb{N}$, substituting the expression for $P(D_{n-1} = 0)$ from Theorem 2.1, we obtain the following expression for the selection bias in this case:

$$
p - \Bigl(p - \frac{1}{2}\Bigr)p^{m}\sum_{l=0}^{m-1}\frac{m-l}{m+l}\binom{m+l}{l}q^{l}. \tag{4.3}
$$

Now we can formulate a result on the total selection bias in $n$ trials.

Proposition 4.3. The total amount of selection bias in $n$, $n \ge 1$, trials for the BCD($p$) is given by

$$
\frac{1}{2} + (n-1)p - \Bigl(p - \frac{1}{2}\Bigr)\sum_{m=1}^{[(n-1)/2]}p^{m}\sum_{l=0}^{m-1}\frac{m-l}{m+l}\binom{m+l}{l}q^{l}, \tag{4.4}
$$

where $[a]$ denotes the integer part of $a$, and we use the adopted convention that a sum is treated as zero when its upper limit of summation is smaller than its lower limit.

One subtracts $n/2$ from (4.4) to obtain the excess selection bias in $n$ trials. The average excess selection bias in $n$ trials (total excess selection bias divided by $n$) of the BCD($p$) for different values of $n$ and $p$ is provided in Table 3.
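A sketch of (4.3) and (4.4) with exact rationals; it also checks that the closed form (4.4) equals the sum of the per-step probabilities from (1.2). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_balanced(m, p):
    """P(D_{2m} = 0) from (2.2), written with n = 2m."""
    q = 1 - p
    return p ** m * sum(Fraction(m - l, m + l) * comb(m + l, l) * q ** l for l in range(m))


def step_selection_bias(j, p):
    """Probability of a correct guess at step j, eqs. (1.2) and (4.3)."""
    if j == 1:
        return Fraction(1, 2)
    if j % 2 == 0:
        return p
    return p - (p - Fraction(1, 2)) * prob_balanced((j - 1) // 2, p)


def total_selection_bias(n, p):
    """Closed form (4.4): expected number of correct guesses in n trials."""
    half = Fraction(1, 2)
    return half + (n - 1) * p - (p - half) * sum(prob_balanced(m, p)
                                                 for m in range(1, (n - 1) // 2 + 1))


def average_excess_selection_bias(n, p):
    return (total_selection_bias(n, p) - Fraction(n, 2)) / n


print(float(average_excess_selection_bias(5, Fraction(3, 5))))   # 0.05792 (Table 3: 0.058)
```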
As expected, the excess selection bias increases with $p$. Also, note that the average excess selection bias is not a monotonic function of $n$. Asymptotically, the excess is given by (1.3) and is reported in the table under $n = \infty$. One can see that the asymptotic formula is a good approximation even for sample sizes as small as 50.

4.3. Accidental bias. With the help of Corollary 2.1, which provides the exact form of the covariance matrix of the BCD($p$), one can compute the accidental bias due to failure to adjust for any covariate $\mathbf{z}$, given by $\mathbf{z}'\Sigma_{\mathbf{T}_n}\mathbf{z}$. However, the point of accidental bias is to control the bias of the treatment effect caused by an unknown covariate. This leads to Efron's minimax solution of using the maximum eigenvalue of $\Sigma_{\mathbf{T}_n}$, given in inequality (1.4). The maximum eigenvalue of $\Sigma_{\mathbf{T}_n}$ therefore represents the maximum susceptibility to accidental bias. At this time we are able to prove the following theorem.

Theorem 4.1. One of the eigenvalues of $\Sigma_{\mathbf{T}_n}$ of the BCD($p$) is $2p$, for all $n \ge 2$ and $p \in [1/2, 1]$.

Proof. See Appendix C. □

Remark. The theorem implies that the maximum eigenvalue of $\Sigma_{\mathbf{T}_n}$ is at least $2p$. Since $2p - \{1 + (p - q)^2\} = 2(2p - 1)(1 - p)$, it therefore exceeds $1 + (p - q)^2$ for every $p \in (1/2, 1)$. This shows that, for such $p$, the maximum eigenvalue of the asymptotic covariance structure studied by Efron (1971) and Steele (1980) is strictly less than the maximum eigenvalue of $\Sigma_{\mathbf{T}_n}$.
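Theorem 4.1 can be checked numerically. The sketch below builds $\Sigma_{\mathbf{T}_n}$ by a forward recursion over the imbalance chain (a swapped-in technique, not the closed form of Corollary 2.1) and confirms that $\mathbf{v} = (1, -1, 0, \ldots, 0)'$ satisfies $\Sigma_{\mathbf{T}_n}\mathbf{v} = 2p\,\mathbf{v}$; one can see this from Proposition 3.1 together with $\sigma_{12} = q - p$. It is offered as an illustration, not as the proof in Appendix C, and the names are hypothetical.

```python
from fractions import Fraction


def up_prob(d, p):
    """t_d = P(next assignment is A | current imbalance d)."""
    return Fraction(1, 2) if d == 0 else (p if d < 0 else 1 - p)


def push(measure, p):
    """Move a (possibly signed) measure on imbalance states one assignment forward."""
    out = {}
    for d, w in measure.items():
        u = up_prob(d, p)
        out[d + 1] = out.get(d + 1, 0) + w * u
        out[d - 1] = out.get(d - 1, 0) + w * (1 - u)
    return out


def covariance_matrix(n, p):
    """Cov(T_1, ..., T_n) under BCD(p) by forward recursion (0-based indices)."""
    cov = [[Fraction(0)] * n for _ in range(n)]
    law = {0: Fraction(1)}          # law of the imbalance just before assignment i
    for i in range(n):
        cov[i][i] = Fraction(1)
        mu = {}                     # mu(d) = E[T_i ; imbalance after assignment i = d]
        for d, w in law.items():
            u = up_prob(d, p)
            mu[d + 1] = mu.get(d + 1, 0) + w * u
            mu[d - 1] = mu.get(d - 1, 0) - w * (1 - u)
        for j in range(i + 1, n):
            # mu now refers to the imbalance just before assignment j, and
            # E[T_j | imbalance d] = 2 t_d - 1
            cov[i][j] = cov[j][i] = sum(w * (2 * up_prob(d, p) - 1) for d, w in mu.items())
            mu = push(mu, p)
        law = push(law, p)
    return cov


def mat_vec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


p, n = Fraction(2, 3), 8
cov = covariance_matrix(n, p)
v = [1, -1] + [0] * (n - 2)
print(mat_vec(cov, v) == [2 * p * x for x in v])   # True: eigenvalue 2p
```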

We conjecture, based on extensive numerical evidence, that the maximum eigenvalue of $\Sigma_{\mathbf{T}_n}$ of the BCD($p$) does not depend on $n$ and is equal to $2p$ for all $n \ge 2$ and $p \in [1/2, 1]$. Note that this leads to a maximum eigenvalue of 1
for p = 0.5, which is the maximum eigenvalue for complete randomization, 
and 2 for p = 1, which is the maximum eigenvalue for the permuted block 
design with block size 2 [Rosenberger and Lachin (2002), Chapter 4]. 

4.4. Randomization tests. The final application of these results is to 
randomization-based inference procedures. Rosenberger and Lachin [(2002), 
Chapters 7, 11] discuss randomization tests in the context of linear rank 
statistics. Let $\mathbf{Y}_n = (Y_1, Y_2, \ldots, Y_n)'$ be the responses based on some primary outcome variable, and let $\mathbf{y}_n$ be its realization. The responses $\mathbf{y}_n$ are treated as fixed quantities, and under the randomization null hypothesis, $\mathbf{y}_n$ is assumed to be unaffected by the treatment assignments. The observed difference between Groups A and B then depends only on the manner in which the $n$ patients were randomized. The general form of a linear rank statistic is $W_n = \mathbf{a}_n'\mathbf{T}_n$, where $\mathbf{a}_n = (a_{1n}, a_{2n}, \ldots, a_{nn})'$ is a score function of the ranks of $\mathbf{y}_n$. The scores are usually centered by subtracting their mean. Most standard test statistics used in clinical trials have an analogous formulation as a linear rank test.

Smythe and Wei (1983) and Hollander and Peña (1988) noted that, unlike for most other restricted randomization procedures, the test statistic $W_n$ is not asymptotically normal under the biased coin design. Therefore the computation of the test requires either the exact distribution or a Monte Carlo approximation. While our results do not give the exact distribution of the test statistic, we can compute its exact variance as $\operatorname{Var}(W_n) = \mathbf{a}_n'\Sigma_{\mathbf{T}_n}\mathbf{a}_n$ using Corollary 2.1. For example, using outcome data from a diabetes trial given in Table 7.4 of Rosenberger and Lachin (2002), we generate a sequence of 50 treatment assignments from Efron's BCD($p = 2/3$) and obtain $W_n = -31$ with exact standard deviation 100.52. The latter computation required computing a $50 \times 50$ matrix using Corollary 2.1.
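The exact variance $\operatorname{Var}(W_n)$ can also be obtained without forming the $n \times n$ matrix. The sketch below (our own swapped-in approach, not the authors' computation) runs a forward recursion over the imbalance $D_j$ while carrying the first two moments of the partial sum $W_j$; the result equals $\mathbf{a}_n'\Sigma_{\mathbf{T}_n}\mathbf{a}_n$. The response values are made up, so the output is not the diabetes-trial figure quoted above, and the hypothetical rank helper ignores ties.

```python
from fractions import Fraction


def rank_test_variance(scores, p):
    """Exact Var(W_n), W_n = sum_j a_j T_j, under BCD(p) and the randomization null.

    Forward recursion over the imbalance d, carrying for each state the triple
    (P(D_j = d), E[W_j; D_j = d], E[W_j^2; D_j = d]).
    """
    state = {0: (Fraction(1), Fraction(0), Fraction(0))}
    for a in scores:
        nxt = {}
        for d, (m0, m1, m2) in state.items():
            u = Fraction(1, 2) if d == 0 else (p if d < 0 else 1 - p)
            for t, w in ((1, u), (-1, 1 - u)):
                n0, n1, n2 = nxt.get(d + t, (0, 0, 0))
                nxt[d + t] = (n0 + w * m0,
                              n1 + w * (m1 + a * t * m0),
                              n2 + w * (m2 + 2 * a * t * m1 + a * a * m0))
        state = nxt
    mean = sum(m1 for _, m1, _ in state.values())
    second = sum(m2 for _, _, m2 in state.values())
    return second - mean ** 2


def centered_rank_scores(y):
    """Hypothetical helper: ranks of y (no tie handling) minus their mean."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    ranks = [0] * len(y)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    mid = Fraction(len(y) + 1, 2)
    return [r - mid for r in ranks]


a = centered_rank_scores([3.1, 0.4, 2.2, 5.0, 1.7, 4.4])   # made-up responses
print(float(rank_test_variance(a, Fraction(2, 3))) ** 0.5)  # exact SD of W_n
```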

5. Conclusions. Despite the favorable properties depicted in Efron's orig- 
inal paper, the biased coin design is sparsely used in clinical trials. The ma- 
jority of clinical trials use a permuted block design which forces balance at 
regular intervals in the trial and achieves perfect balance unless there is an 
unfilled final block. However, in permuted blocks, some patients are assigned 
to treatment with probability 1 which can contribute to a vulnerability to 
selection bias, particularly in unmasked trials. We believe that Efron's pro- 
cedure should be used regularly in clinical trials where balance in treatments 
is desirable, both for its simplicity and for the reason that Efron suggested: 
it promotes balance with minimal susceptibility to experimental biases. We have now quantified the distribution of balance and the susceptibility to biases in closed-form formulas for any $p$ and $n$, and this should aid the clinical trialist in designing the trial appropriately.

The selection of p has always been an interesting question. Efron used 
p = 2/3 in some of his examples. At one extreme, p = 1/2, we have com- 
plete randomization which has minimal selection and accidental biases, but 
maximum variability. At the other extreme, $p = 1$, we have a deterministic sequence with maximum selection and accidental biases, but no variability. Formally, the selection of $p$ should be a trade-off between the degree of randomness desired (as reflected in selection bias), accidental bias (which, by Theorem 4.1 and our conjecture, is linear in $p$) and $\operatorname{Var}(D_n)$, which are competing objectives. Such multi-objective problems can be solved through a compound optimality criterion with weights reflecting the relative importance of the criteria to the investigator. We now provide exact formulas for these criteria in (4.1) and (4.4).

We note that these results may have applicability beyond clinical trials, 
as they form the basis of exact distribution theory for a general asymmet- 
ric random walk. While the theorems are proved for p > 0.5, they can be 
generalized for any p [Markaryan (2009)]. 
APPENDIX A: PROOF OF THEOREM 2.1

The following proposition follows immediately from the definition of the BCD($p$) and is used without explicit mention in the proof of Theorem 2.1.

Proposition A.1. Let $n = 1, 2, 3, \ldots$, $k \in \mathbb{Z}$ and $q = 1 - p$. The following hold for the BCD($p$):

1. $P(D_n = k) > 0 \iff |k| \le n$ and $n$ and $k$ have the same parity;

2. $P(D_n = k) = P(D_n = -k)$;

3. $P(D_{n+1} = 0) = 2p\,P(D_n = 1)$;

4. $P(D_{n+1} = 1) = \frac{1}{2}P(D_n = 0) + p\,P(D_n = 2)$;

5. $P(D_{n+1} = k) = (1 - p)P(D_n = k - 1) + p\,P(D_n = k + 1)$, for $2 \le k \le n$;

6. $P(D_{n+1} = n + 1) = (1 - p)P(D_n = n)$.

Next we formulate and prove two lemmas. 

Lemma A.1. Let $n$ be a positive even integer, and let $l$ be an integer satisfying $0 < l < n/2$. Then the following holds:

$$
\frac{n-2l}{n+2l}\binom{n/2+l}{l} + \frac{n-2l+4}{n+2l}\binom{n/2+l}{l-1} = \frac{n+2-2l}{n+2+2l}\binom{n/2+l+1}{l}. \tag{A.1}
$$

Proof. First, we make the substitution $u = n/2$ in (A.1) to obtain the equivalent expression

$$
\frac{u-l}{u+l}\binom{u+l}{l} + \frac{u-l+2}{u+l}\binom{u+l}{l-1} = \frac{u+1-l}{u+l+1}\binom{u+l+1}{l}. \tag{A.2}
$$

Using the easily checked identities

$$
\binom{u+l}{l} = \frac{u+1}{u+l+1}\binom{u+l+1}{l}
\quad\text{and}\quad
\binom{u+l}{l-1} = \frac{l}{u+l+1}\binom{u+l+1}{l},
$$

the left-hand side of (A.2) can be rewritten as

$$
\frac{(u-l)(u+1) + l(u-l+2)}{(u+l)(u+l+1)}\binom{u+l+1}{l},
$$

and the lemma follows from noting that

$$
\frac{(u-l)(u+1) + l(u-l+2)}{u+l} = u + 1 - l.
$$

□
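A quick exact-arithmetic check of (A.1) over a range of even $n$; it is an illustration that complements, not replaces, the algebraic proof. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def lemma_a1_sides(n, l):
    """Left- and right-hand sides of (A.1) for even n and 0 < l < n/2."""
    h = n // 2
    lhs = (Fraction(n - 2 * l, n + 2 * l) * comb(h + l, l)
           + Fraction(n - 2 * l + 4, n + 2 * l) * comb(h + l, l - 1))
    rhs = Fraction(n + 2 - 2 * l, n + 2 + 2 * l) * comb(h + l + 1, l)
    return lhs, rhs


checks = [lemma_a1_sides(n, l) for n in range(2, 80, 2) for l in range(1, n // 2)]
print(all(lhs == rhs for lhs, rhs in checks))   # True
```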
Lemma A.2. Let $n$ be a positive integer, $k$ an integer satisfying $2 \le k \le n$ and $l$ an integer satisfying $1 \le l \le (n-k+1)/2$, and let $n$ and $k$ have opposite parities. Then the following holds:

$$
\frac{n+k-2l+3}{n+k+2l-1}\binom{(n+k-1)/2+l}{l-1} + \frac{n+k-2l-1}{n+k+2l-1}\binom{(n+k-1)/2+l}{l} = \frac{n+k-2l+1}{n+k+2l+1}\binom{(n+k+1)/2+l}{l}. \tag{A.3}
$$

Proof. We first make the substitution $u = (n+k+1)/2$ in (A.3) and obtain the equivalent expression

$$
\frac{u-l+1}{u+l-1}\binom{u+l-1}{l-1} + \frac{u-l-1}{u+l-1}\binom{u+l-1}{l} = \frac{u-l}{u+l}\binom{u+l}{l}. \tag{A.4}
$$

Using the easily verified identities

$$
\binom{u+l-1}{l-1} = \frac{l}{u+l}\binom{u+l}{l}
\quad\text{and}\quad
\binom{u+l-1}{l} = \frac{u}{u+l}\binom{u+l}{l},
$$

and dividing both sides of (A.4) by

$$
\frac{1}{(u+l-1)(u+l)}\binom{u+l}{l},
$$

the result follows, since $(u-l+1)l + (u-l-1)u = (u-l)(u+l-1)$. □
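Similarly, (A.3) can be checked exactly over a range of admissible $(n, k, l)$. This is an illustrative sketch with a hypothetical function name.

```python
from fractions import Fraction
from math import comb


def lemma_a2_sides(n, k, l):
    """Left- and right-hand sides of (A.3); needs 2 <= k <= n, n - k odd,
    and 1 <= l <= (n - k + 1)/2."""
    s = (n + k - 1) // 2            # an integer, since n + k is odd
    lhs = (Fraction(n + k - 2 * l + 3, n + k + 2 * l - 1) * comb(s + l, l - 1)
           + Fraction(n + k - 2 * l - 1, n + k + 2 * l - 1) * comb(s + l, l))
    rhs = Fraction(n + k - 2 * l + 1, n + k + 2 * l + 1) * comb(s + 1 + l, l)
    return lhs, rhs


checks = [lemma_a2_sides(n, k, l)
          for n in range(3, 40)
          for k in range(2, n + 1) if (n - k) % 2
          for l in range(1, (n - k + 1) // 2 + 1)]
print(all(lhs == rhs for lhs, rhs in checks))   # True
```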

Before we prove the theorem, note that in light of Proposition A.1, the assumptions on $n$ and $k$ serve to identify the nonzero-probability events. Also, due to symmetry, we can restrict the proof to the case of nonnegative $k$. The proof is by induction and involves a series of straightforward calculations. The theorem is trivially true for the cases $n = 1$ and $n = 2$. We assume the theorem is true for all positive integers up to and including $n$ and prove that it is true for $n + 1$. The proof is broken into four cases: $k = 0$, $k = 1$, $2 \le k \le n$ and $k = n + 1$.

Proof of Theorem 2.1.
Case $k = 0$. Here $n$ is odd, and

$$
P(D_{n+1} = 0) = 2p\,P(D_n = 1) = p^{(n+1)/2}\sum_{l=0}^{(n-1)/2}\frac{n+1-2l}{n+1+2l}\binom{(n+1)/2+l}{l}q^{l},
$$

which is exactly (2.2) with $n$ replaced by $n + 1$.
Case $k = 1$. Here $n$ is even, and we need to show that

$$
P(D_{n+1} = 1) = \frac{p^{n/2}}{2}\sum_{l=0}^{n/2}\frac{n+2-2l}{n+2+2l}\binom{n/2+1+l}{l}q^{l}. \tag{A.5}
$$

Then

$$
P(D_{n+1} = 1) = \tfrac{1}{2}P(D_n = 0) + p\,P(D_n = 2)
= \frac{p^{n/2}}{2}\sum_{l=0}^{n/2-1}\frac{n-2l}{n+2l}\binom{n/2+l}{l}q^{l}
+ \frac{p^{n/2}}{2}\sum_{l=0}^{(n-2)/2}\frac{n+2-2l}{n+2+2l}\binom{n/2+1+l}{l}q^{l+1}.
$$

Now we shift the summation index in the second term, $l := l + 1$, and then collect the terms under a single summation:

$$
P(D_{n+1} = 1) = \frac{p^{n/2}}{2}\left\{1 + \sum_{l=1}^{n/2-1}\left[\frac{n-2l}{n+2l}\binom{n/2+l}{l} + \frac{n-2l+4}{n+2l}\binom{n/2+l}{l-1}\right]q^{l} + \frac{2}{n}\binom{n}{n/2-1}q^{n/2}\right\}. \tag{A.6}
$$

Similar to the right-hand side of (A.5), the expression obtained in (A.6) is the product of $p^{n/2}/2$ and an $(n/2)$th-order polynomial in $q$. Therefore it remains to show that the polynomial inside the curly braces in (A.6) is the same as the polynomial on the right-hand side of (A.5). We will show term-by-term equality. First, the constant term in (A.6) is 1, which is the same as the constant term in (A.5). To show that the coefficients of $q^{n/2}$ are equal, we need to show the following equality:

$$
\frac{2}{n}\binom{n}{n/2-1} = \frac{1}{n+1}\binom{n+1}{n/2}.
$$
We transform the left-hand side to obtain the right-hand side as follows:

$$
\frac{2}{n}\binom{n}{n/2-1}
= \frac{2}{n}\cdot\frac{n!}{(n/2-1)!\,(n/2+1)!}
= \frac{n!}{(n/2)!\,(n/2+1)!}
= \frac{1}{n+1}\cdot\frac{(n+1)!}{(n/2)!\,(n/2+1)!}
= \frac{1}{n+1}\binom{n+1}{n/2}.
$$

To complete the proof for the case $k = 1$, it remains to show that the coefficients of $q^{l}$ are equal for $0 < l < n/2$. This is contained in Lemma A.1.
Case $2 \le k \le n$. We need to show that

$$
P(D_{n+1} = k) = \frac{1}{2}p^{(n-k+1)/2}\sum_{l=0}^{(n-k+1)/2}\frac{n+k+1-2l}{n+k+1+2l}\binom{(n+k+1)/2+l}{l}q^{k+l-1}. \tag{A.7}
$$

When $k = n$, $n + 1$ and $k$ have opposite parities; therefore we can assume that $2 \le k \le n - 1$. We have

$$
\begin{aligned}
P(D_{n+1} = k) &= p\,P(D_n = k+1) + q\,P(D_n = k-1)\\
&= \frac{1}{2}p^{(n-k+1)/2}\sum_{l=0}^{(n-k-1)/2}\frac{n+k+1-2l}{n+k+1+2l}\binom{(n+k+1)/2+l}{l}q^{k+l}\\
&\quad + \frac{1}{2}p^{(n-k+1)/2}\sum_{l=0}^{(n-k+1)/2}\frac{n+k-1-2l}{n+k-1+2l}\binom{(n+k-1)/2+l}{l}q^{k+l-1}.
\end{aligned}
$$

Now we shift the summation index in the first term, $l := l + 1$, and then collect the terms under a single summation to obtain

$$
P(D_{n+1} = k) = c\left\{1 + \sum_{l=1}^{(n-k+1)/2}\left[\frac{n+k-2l+3}{n+k+2l-1}\binom{(n+k-1)/2+l}{l-1} + \frac{n+k-2l-1}{n+k+2l-1}\binom{(n+k-1)/2+l}{l}\right]q^{l}\right\}, \tag{A.8}
$$

where $c = p^{(n-k+1)/2}q^{k-1}/2$. Comparing (A.8) with (A.7), we immediately see that the terms corresponding to $l = 0$ are equal to $c$. To complete the proof for the case $2 \le k \le n$, all that remains is an application of Lemma A.2.

Case $k = n + 1$. This follows immediately from the fact that

$$
P(D_{n+1} = n + 1) = \tfrac{1}{2}q^{n}.
$$

The theorem is proved. □

APPENDIX B: PROOF OF THEOREM 2.2 

The following proposition follows immediately from the Markov property and time homogeneity of the BCD($p$) process.

Proposition B.1. Let $n = 0, 1, 2, 3, \ldots$, $m = 1, 2, 3, \ldots$ and $m > n$. Define $\sigma(\mathbf{T}_n)$ to be the sigma-algebra generated by $T_1, \ldots, T_n$. The following hold for the BCD($p$):

1. $P(T_m = \pm 1 \mid D_n, \sigma(\mathbf{T}_n)) = P(T_m = \pm 1 \mid D_n)$;

2. $P(T_m = \pm 1 \mid D_n = k) = P(T_{m+l} = \pm 1 \mid D_{n+l} = k)$, for any $l \ge 0$.

Next we state and prove three lemmas that are used in the proof of Theorem 2.2.

Lemma B.1. Let $1 \le n < m$. Then the following holds for the BCD($p$), $p \in [0, 1]$:

$$
P(T_n = 1, T_m = 1) = \sum_{k=-n+1}^{n-1}P(T_m = 1 \mid D_n = k+1)\,d_{n-1,k}\,t_k. \tag{B.1}
$$

Proof. Before providing the proof, note that because Theorem 2.1 gives the form of $d_{n,k}$, the lemma reduces finding $P(T_n = 1, T_m = 1)$ to finding conditional probabilities of the form $P(T_m = 1 \mid D_n = k)$. By conditioning on $D_{n-1}$ we obtain

$$
P(T_n = 1, T_m = 1) = \sum_{k=-n+1}^{n-1}P(T_n = 1, T_m = 1 \mid D_{n-1} = k)\,P(D_{n-1} = k). \tag{B.2}
$$
Note that (B.2) holds for $n = 1$ as well, because we defined $P(D_0 = 0) = 1$. Also, instead of requiring $n - k$ to be odd so that $P(D_{n-1} = k) > 0$, we follow the adopted convention that probabilities of events conditional on a zero-probability event are treated as 0.

Now we make use of an easily verified identity,²

$$
P(A \cap B \mid C) = P(A \mid B \cap C)\,P(B \mid C),
$$

to transform the conditional probabilities on the right-hand side of (B.2):

$$
P(T_n = 1, T_m = 1 \mid D_{n-1} = k) = P(T_m = 1 \mid T_n = 1, D_{n-1} = k)\,P(T_n = 1 \mid D_{n-1} = k). \tag{B.3}
$$

Now we use the fact that the following two events are equal:

$$
\{D_{n-1} = k \text{ and } T_n = 1\} \quad\text{and}\quad \{D_n = k+1 \text{ and } T_n = 1\},
$$

and that $T_m$ is conditionally independent of $T_n$ given $D_n$, to write

$$
P(T_m = 1 \mid T_n = 1, D_{n-1} = k) = P(T_m = 1 \mid T_n = 1, D_n = k+1) = P(T_m = 1 \mid D_n = k+1).
$$

Substituting this last expression into the right-hand side of (B.3), we obtain

$$
P(T_n = 1, T_m = 1 \mid D_{n-1} = k) = P(T_m = 1 \mid D_n = k+1)\,P(T_n = 1 \mid D_{n-1} = k).
$$

The result follows from substitution into (B.2), since $P(T_n = 1 \mid D_{n-1} = k) = t_k$. □

The next lemma is devoted to finding the probabilities of first visits of the imbalance process to state 0. We define $\tau_i$ to be the number of steps the imbalance process takes to visit state 0 for the first time, starting from state $i$.

Lemma B.2. For the imbalance process of the BCD($p$), $p \in [0, 1]$, the probabilities of the first visit from state $k$, $k = \pm 1, \pm 2, \ldots$, to state 0 in exactly $l$ steps, $l \ge |k|$, are given by the following formula:

$$
f^{(l)}_{k,0} = P(\tau_k = l) = \frac{|k|}{l}\binom{l}{(l+|k|)/2}p^{(l+|k|)/2}q^{(l-|k|)/2}, \tag{B.4}
$$

where, according to the adopted convention, the binomial coefficient is treated as 0 when $(l + |k|)/2$ is not an integer.



2 This identity still holds when either B or C has zero probability, when used with the adopted convention.
Proof. First, due to symmetry, $f_{k0}^{(l)} = f_{-k,0}^{(l)}$ for any $k \in \mathbb{N}$. Therefore, without loss of generality, we can assume that $k$ is positive. Thus we are concerned with finding the probabilities of first visits from state $k$, $k \in \mathbb{N}$, to state 0 in exactly $l \ge k$ steps.

We can treat this problem as a random walk on the nonnegative integers with an absorbing barrier at 0 and use well-known results on the classical gambler's ruin problem, in which the gambler plays against an infinitely rich adversary and at each step wins one unit with probability $q$ and loses one unit with probability $p$. The question is equivalently formulated as follows: what is the probability that a gambler with initial capital $k$, $k \in \mathbb{N}$, is ruined in exactly $l$, $l \ge k$, steps? These probabilities are well known and can be found in (4.14) of Feller (1968); one needs to reverse the roles of $p$ and $q$ and replace $z$ with $k$ and $n$ with $l$. □
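The closed form (B.4) can be compared with a direct first-passage computation. The Python sketch below (hypothetical helper names, exact rational arithmetic) makes state 0 absorbing and uses the fact that, under the BCD($p$) rule, the walk moves toward 0 with probability $p$ from any nonzero state.

```python
from fractions import Fraction
from math import comb


def first_visit_formula(k, l, p):
    """f_{k0}^{(l)} from (B.4); zero when l < |k| or (l + |k|)/2 is not an integer."""
    q = 1 - p
    a = abs(k)
    if k == 0:
        return Fraction(1 if l == 0 else 0)
    if l < a or (l + a) % 2:
        return Fraction(0)
    h = (l + a) // 2
    return Fraction(a, l) * comb(l, h) * p ** h * q ** (l - h)


def first_visit_dp(k, l, p):
    """P(tau_k = l) by evolving the imbalance walk with state 0 made absorbing."""
    q = 1 - p
    if k == 0:
        return Fraction(1 if l == 0 else 0)
    dist = {k: Fraction(1)}        # probability mass that has not yet visited 0
    hit = Fraction(0)              # mass absorbed at 0 in the current step
    for _ in range(l):
        new, hit = {}, Fraction(0)
        for d, mass in dist.items():
            toward = d - 1 if d > 0 else d + 1   # taken with probability p
            away = d + 1 if d > 0 else d - 1     # taken with probability q
            if toward == 0:
                hit += mass * p
            else:
                new[toward] = new.get(toward, 0) + mass * p
            new[away] = new.get(away, 0) + mass * q
        dist = new
    return hit


p = Fraction(3, 5)
print(all(first_visit_formula(k, l, p) == first_visit_dp(k, l, p)
          for k in (-3, -2, -1, 1, 2, 3) for l in range(15)))
```

The dynamic-programming side does not use any combinatorics, so agreement of the two functions is an independent check of (B.4) for small $l$.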

Lemma B.2 provides all the nontrivial probabilities $f_{k0}^{(l)}$. To complete the remaining cases, we note that $f_{k0}^{(0)} = 0$ when $k \neq 0$, and $f_{00}^{(0)} = 1$.

The next lemma provides the probabilities that the imbalance process ultimately reaches state 0 from any other state.

Lemma B.3. For the imbalance process of the BCD($p$), $p \in [0.5, 1]$, the probability of ultimately reaching state 0 from state $k$, $k = \pm 1, \pm 2, \ldots$, is 1.

Proof. The proof of the lemma is similar to that of Lemma B.2. Again, without loss of generality, it can be assumed that $k$ is positive. The problem is equivalent to computing the probability of ultimate ruin in the classical gambler's ruin problem when the gambler, having initial capital $k$, plays against an infinitely rich adversary and at each step wins one unit with probability $q$ and loses one unit with probability $p$. These probabilities can be found in (2.18) of Feller (1968); one needs to reverse the roles of $p$ and $q$ and replace $z$ with $k$. □

Note that Lemma B.3 implies that $f_{k0}^{(l)}$, viewed as a function of $l$, is a probability mass function when $p \in [0.5, 1]$.
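Lemma B.3 can be illustrated numerically: the partial sums $\sum_{l \le L} f_{k0}^{(l)}$ should approach 1 as $L$ grows when $p \ge 1/2$. For contrast, the sketch also evaluates $p < 1/2$, where the classical gambler's-ruin result (with the roles of $p$ and $q$ reversed) gives the limit $(p/q)^{|k|}$ instead. The function name, the floating-point arithmetic and the horizons are illustrative choices, not taken from the paper.

```python
def prob_reach_zero_by(k, horizon, p):
    """P(tau_k <= horizon) for the BCD(p) imbalance walk, absorbed at state 0."""
    q = 1.0 - p
    if k == 0:
        return 1.0
    dist = {k: 1.0}      # mass that has not yet visited 0
    absorbed = 0.0       # cumulative mass that has visited 0
    for _ in range(horizon):
        new = {}
        for d, mass in dist.items():
            toward = d - 1 if d > 0 else d + 1   # step toward 0, probability p
            away = d + 1 if d > 0 else d - 1     # step away from 0, probability q
            if toward == 0:
                absorbed += mass * p
            else:
                new[toward] = new.get(toward, 0.0) + mass * p
            new[away] = new.get(away, 0.0) + mass * q
        dist = new
    return absorbed


for p in (0.7, 0.5, 0.3):
    limit = 1.0 if p >= 0.5 else (p / (1.0 - p)) ** 3
    partial = [round(prob_reach_zero_by(3, h, p), 4) for h in (10, 100, 400)]
    print(p, partial, "limit:", round(limit, 4))
```

Convergence at $p = 1/2$ is slow (the partial sums still fall visibly short of 1 at these horizons), which is expected for a driftless walk.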

Before starting the proof of Theorem 2.2, note that only about half of the summands on the right-hand side of (2.3) are nonzero, because the imbalance probabilities they contain vanish whenever $n - k$ is not even.

Proof of Theorem 2.2. The essence of the proof is in evaluating conditional probabilities of the form $P(T_m = 1 \mid D_n = k)$. We will show that for $1 \le n < m$, $|k| \le n$ and $n - k$ even, the following holds:

$$P(T_m = 1 \mid D_n = k) = \left(\tfrac{1}{2} - t_k\right) f_{k0}^{[m-n-1]} + t_k. \tag{B.5}$$
This equation is of independent interest, as it gives the probabilities of treatment assignments conditional on a past value of the imbalance. Note that when $k = 0$, (B.5) simply states that $P(T_m = 1 \mid D_n = 0) = 1/2$, as expected. The case $m = n + 1$ is the definition of the BCD($p$).

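To prove (B.5), we use a conditioning argument and condition on the time $\tau_k$ of the first visit to state 0 after time $n$, as follows:

$$\begin{aligned}
P(T_m = 1 \mid D_n = k) &= \sum_{l=0}^{m-n-1} P(T_m = 1 \mid D_n = k, \tau_k = l)\, P(\tau_k = l \mid D_n = k) \\
&\quad + P(T_m = 1 \mid D_n = k, \tau_k \notin [0, m-n-1]) \\
&\quad \times P(\tau_k \notin [0, m-n-1] \mid D_n = k).
\end{aligned} \tag{B.6}$$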

We first evaluate $P(T_m = 1 \mid D_n = k, \tau_k = l)$ for $0 \le l \le m - n - 1$. We only need to consider the cases in which $(l - |k|)/2$ is a nonnegative integer, because in all other cases $P(\tau_k = l \mid D_n = k) = 0$:

$$\begin{aligned}
P(T_m = 1 \mid D_n = k, \tau_k = l) &= P(T_m = 1 \mid D_n = k, D_{n+1} \neq 0, D_{n+2} \neq 0, \ldots, D_{n+l-1} \neq 0, D_{n+l} = 0) \\
&= P(T_m = 1 \mid D_{n+l} = 0) = P(T_{m-n-l} = 1) = 1/2.
\end{aligned}$$

The first equality in the chain above is a consequence of the following equality of events:

$$\{D_n = k, \tau_k = l\} = \{D_n = k, D_{n+1} \neq 0, D_{n+2} \neq 0, \ldots, D_{n+l-1} \neq 0, D_{n+l} = 0\}.$$

The second equality is just the Markov property of the imbalance process [see Proposition B.1(1)]. The third equality follows from the time-homogeneity property formulated in Proposition B.1(2), and the last from the symmetry of the design, which gives $P(T_j = 1) = 1/2$ for every $j$. Thus we have proved that when $1 \le n < m$, $|k| \le n$, $n - k$ is even, $0 \le l \le m - n - 1$ and $(l - |k|)/2$ is a nonnegative integer,

$$P(T_m = 1 \mid D_n = k, \tau_k = l) = 1/2. \tag{B.7}$$

Now we turn to the case $\tau_k \notin [0, m - n - 1]$. As before, $1 \le n < m$, $|k| \le n$ and $n - k$ is even. We consider three subcases.

Case $k > 0$.

$$\begin{aligned}
P(T_m = 1 \mid D_n = k, \tau_k \ge m - n) &= P(T_m = 1 \mid D_n = k, D_{m-1} > 0, \tau_k \notin [0, m - n - 1]) \\
&= P(T_m = 1 \mid D_{m-1} > 0) = q.
\end{aligned}$$
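The first equality above follows from the equality of the events $\{D_n = k, \tau_k \ge m - n\}$ and $\{D_n = k, D_{m-1} > 0, \tau_k \notin [0, m - n - 1]\}$: a walk that starts at $k > 0$, moves in steps of $\pm 1$ and does not visit 0 before time $m$ must remain positive. The second equality follows from Proposition B.1(1).

Case $k < 0$.

$$\begin{aligned}
P(T_m = 1 \mid D_n = k, \tau_k \ge m - n) &= P(T_m = 1 \mid D_n = k, D_{m-1} < 0, \tau_k \notin [0, m - n - 1]) \\
&= P(T_m = 1 \mid D_{m-1} < 0) = p.
\end{aligned}$$

The first equality above follows from the equality of the events $\{D_n = k, \tau_k \ge m - n\}$ and $\{D_n = k, D_{m-1} < 0, \tau_k \notin [0, m - n - 1]\}$. The second equality follows from Proposition B.1(1).

Case $k = 0$.

$$P(T_m = 1 \mid D_n = 0, \tau_0 \notin [0, m - n - 1]) = 0,$$

because the event $\{\tau_0 \ge 1\}$ is impossible: by definition $\tau_0 = 0$, so the conditioning event has probability zero and the adopted convention applies.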

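Substituting (B.7) and the expressions obtained in the three cases above into (B.6), and using $P(\tau_k = l \mid D_n = k) = f_{k0}^{(l)}$ (time homogeneity), we obtain

$$P(T_m = 1 \mid D_n = k) = \sum_{l=0}^{m-n-1} \tfrac{1}{2} f_{k0}^{(l)} + t_k\, P(\tau_k \notin [0, m-n-1] \mid D_n = k). \tag{B.8}$$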

According to Lemma B.3, when $p \in [1/2, 1]$, we have

$$P(\tau_k \notin [0, m-n-1] \mid D_n = k) = 1 - P(\tau_k \in [0, m-n-1] \mid D_n = k). \tag{B.9}$$

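Substituting (B.9) into (B.8) and using $P(\tau_k \in [0, m-n-1] \mid D_n = k) = \sum_{l=0}^{m-n-1} f_{k0}^{(l)} = f_{k0}^{[m-n-1]}$, we obtain

$$\begin{aligned}
P(T_m = 1 \mid D_n = k) &= \sum_{l=0}^{m-n-1} \tfrac{1}{2} f_{k0}^{(l)} + t_k \bigl(1 - f_{k0}^{[m-n-1]}\bigr) \\
&= \tfrac{1}{2} f_{k0}^{[m-n-1]} + t_k \bigl(1 - f_{k0}^{[m-n-1]}\bigr) \\
&= \bigl(\tfrac{1}{2} - t_k\bigr) f_{k0}^{[m-n-1]} + t_k.
\end{aligned}$$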

Thus (B.5) is proved. To complete the proof of the theorem, it remains to use Lemma B.1 and substitute (B.5), with $k := k + 1$, into (B.1). □
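Equation (B.5) can be checked exactly for small horizons by comparing it with a direct evolution of the imbalance chain from $D_n = k$ over $m - n - 1$ steps. In this Python sketch (hypothetical names) $f_{k0}^{[l]}$ is computed as $P(\tau_k \le l)$, and $t_k$ is taken as $q$ for $k > 0$ and $p$ for $k < 0$, as in the three cases above; the value used for $t_0$ is immaterial because $f_{00}^{[l]} = 1$. Only $p \ge 1/2$ is tested, the range covered by the proof.

```python
from fractions import Fraction


def prob_a(d, p):
    """P(T = 1 | current imbalance d) under BCD(p) (assumed allocation rule)."""
    if d == 0:
        return Fraction(1, 2)
    return 1 - p if d > 0 else p


def direct(k, steps, p):
    """P(T_m = 1 | D_n = k) with steps = m - n - 1, by evolving the imbalance chain."""
    dist = {k: Fraction(1)}
    for _ in range(steps):
        new = {}
        for d, mass in dist.items():
            pa = prob_a(d, p)
            new[d + 1] = new.get(d + 1, 0) + mass * pa
            new[d - 1] = new.get(d - 1, 0) + mass * (1 - pa)
        dist = new
    return sum(mass * prob_a(d, p) for d, mass in dist.items())


def cumulative_first_visit(k, horizon, p):
    """f_{k0}^{[horizon]} = P(tau_k <= horizon), with the walk absorbed at 0."""
    if k == 0:
        return Fraction(1)
    dist, absorbed = {k: Fraction(1)}, Fraction(0)
    for _ in range(horizon):
        new = {}
        for d, mass in dist.items():
            toward = d - 1 if d > 0 else d + 1   # probability p
            away = d + 1 if d > 0 else d - 1     # probability q
            if toward == 0:
                absorbed += mass * p
            else:
                new[toward] = new.get(toward, 0) + mass * p
            new[away] = new.get(away, 0) + mass * (1 - p)
        dist = new
    return absorbed


def formula_b5(k, steps, p):
    """Right-hand side of (B.5) with steps = m - n - 1."""
    q = 1 - p
    t_k = q if k > 0 else p if k < 0 else Fraction(1, 2)
    return (Fraction(1, 2) - t_k) * cumulative_first_visit(k, steps, p) + t_k


p = Fraction(2, 3)
print(all(direct(k, s, p) == formula_b5(k, s, p)
          for k in range(-4, 5) for s in range(8)))
```

The direct evolution never uses first-passage probabilities, so agreement with `formula_b5` checks the conditioning argument behind (B.5) rather than restating it.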

APPENDIX C: PROOF OF THEOREM 4.1 

We will show that $2p$ is an eigenvalue of $\Sigma_n$ with the corresponding normalized eigenvector $a_n = (a_1^n, a_2^n, a_3^n, \ldots, a_n^n)' = (\sqrt{2}/2, -\sqrt{2}/2, 0, \ldots, 0)'$. The proof proceeds by induction. The theorem is trivially true for $n = 2$; in this case one can actually show that $2p$ is the maximum eigenvalue.$^{3}$ The two eigenvalues of $\Sigma_2$ are $2p$ and $2 - 2p$, and $2p \ge 2 - 2p$ when $p \ge 1/2$.

Proof. We assume that the theorem is true for some $n \ge 2$ and prove that it is true for $n + 1$. We partition $\Sigma_{n+1}$ as follows:

$$\Sigma_{n+1} = \begin{pmatrix} \Sigma_n & b \\ b' & 1 \end{pmatrix},$$

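where $b = (\sigma_{1,n+1}, \sigma_{2,n+1}, \ldots, \sigma_{n,n+1})'$ with $\sigma_{ij} = \mathrm{Cov}(T_i, T_j)$. Denote $x = (x_1', 0)'$, where $x_1$ is the $n$-dimensional vector

$$x_1 = (\sqrt{2}/2, -\sqrt{2}/2, 0, \ldots, 0)'.$$

We need to show that

$$\begin{pmatrix} \Sigma_n & b \\ b' & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ 0 \end{pmatrix} = 2p \begin{pmatrix} x_1 \\ 0 \end{pmatrix}. \tag{C.1}$$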

By the induction assumption, we have $\Sigma_n x_1 = 2p x_1$. To prove (C.1), it remains to show that $b' x_1 = 0$. This is equivalent to $(\sqrt{2}/2)(\sigma_{1,n+1} - \sigma_{2,n+1}) = 0$, which in turn is equivalent to (see Corollary 2.1)

$$P(T_1 = 1, T_{n+1} = 1) = P(T_2 = 1, T_{n+1} = 1). \tag{C.2}$$



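From Theorem 2.2 we have the forms of $P(T_1 = 1, T_{n+1} = 1)$ and $P(T_2 = 1, T_{n+1} = 1)$:

$$\begin{aligned}
P(T_1 = 1, T_{n+1} = 1) &= \Bigl(\bigl(\tfrac{1}{2} - q\bigr) f_{10}^{[n-1]} + q\Bigr)\tfrac{1}{2} = \tfrac{1}{2}\bigl(\tfrac{1}{2} - q\bigr) f_{10}^{[n-1]} + \tfrac{1}{2} q, \\
P(T_2 = 1, T_{n+1} = 1) &= \Bigl(\bigl(\tfrac{1}{2} - \tfrac{1}{2}\bigr) f_{00}^{[n-2]} + \tfrac{1}{2}\Bigr)\tfrac{1}{2} p + \Bigl(\bigl(\tfrac{1}{2} - q\bigr) f_{20}^{[n-2]} + q\Bigr)\tfrac{1}{2} q \\
&= \tfrac{1}{4} p + \tfrac{1}{2} q\bigl(\tfrac{1}{2} - q\bigr) f_{20}^{[n-2]} + \tfrac{1}{2} q^2.
\end{aligned}$$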

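Thus, in order to show (C.2), we need to show that

$$\tfrac{1}{2}\bigl(\tfrac{1}{2} - q\bigr) f_{10}^{[n-1]} + \tfrac{1}{2} q = \tfrac{1}{4} p + \tfrac{1}{2} q\bigl(\tfrac{1}{2} - q\bigr) f_{20}^{[n-2]} + \tfrac{1}{2} q^2. \tag{C.3}$$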

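Using an easily verified identity (obtained by conditioning on the first step of the imbalance process from state 1),

$$f_{10}^{[n-1]} = p + q f_{20}^{[n-2]},$$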
$^{3}$ The same can be shown for $n = 3$ and $n = 4$ by solving for the zeros of the characteristic polynomials.
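and substituting it into (C.3), we obtain

$$\tfrac{1}{2} p\bigl(\tfrac{1}{2} - q\bigr) + \tfrac{1}{2} q\bigl(\tfrac{1}{2} - q\bigr) f_{20}^{[n-2]} + \tfrac{1}{2} q = \tfrac{1}{4} p + \tfrac{1}{2} q\bigl(\tfrac{1}{2} - q\bigr) f_{20}^{[n-2]} + \tfrac{1}{2} q^2. \tag{C.4}$$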
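The term $\tfrac{1}{2} q(\tfrac{1}{2} - q) f_{20}^{[n-2]}$ appears on both sides of (C.4); subtracting it from both sides, we require

$$\tfrac{1}{2} p\bigl(\tfrac{1}{2} - q\bigr) + \tfrac{1}{2} q = \tfrac{1}{4} p + \tfrac{1}{2} q^2.$$

This last equality is trivially checked: the difference of its two sides is $\tfrac{1}{2} q (1 - p - q) = 0$, since $p + q = 1$. □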
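Theorem 4.1 can be verified numerically for small $n$ by building $\Sigma_n$ exactly from all $2^n$ assignment sequences. The Python sketch assumes, consistent with the unit lower-right entry in the partition of $\Sigma_{n+1}$ and with the eigenvalues $2p$ and $2 - 2p$ of $\Sigma_2$, that assignments are coded $T_i = +1$ (A) or $-1$ (B); the helper names are hypothetical. The eigenvector is used unnormalized so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product


def covariance_matrix(n, p):
    """Exact covariance matrix of T_1..T_n coded +1 (A) / -1 (B) under BCD(p)."""
    q = 1 - p
    second = [[Fraction(0)] * n for _ in range(n)]  # E[T_i T_j]
    mean = [Fraction(0)] * n                        # E[T_i]
    for seq in product((1, -1), repeat=n):
        prob, d = Fraction(1), 0
        for t in seq:
            pa = Fraction(1, 2) if d == 0 else (q if d > 0 else p)
            prob *= pa if t == 1 else 1 - pa
            d += t  # imbalance after this assignment
        for i in range(n):
            mean[i] += prob * seq[i]
            for j in range(n):
                second[i][j] += prob * seq[i] * seq[j]
    return [[second[i][j] - mean[i] * mean[j] for j in range(n)] for i in range(n)]


def is_eigvec(n, p):
    """Check Sigma_n a = 2p a for a = (1, -1, 0, ..., 0)', i.e. a_n up to scaling."""
    sigma = covariance_matrix(n, p)
    a = [1, -1] + [0] * (n - 2)
    sigma_a = [sum(sigma[i][j] * a[j] for j in range(n)) for i in range(n)]
    return sigma_a == [2 * p * x for x in a]


for p in (Fraction(1, 2), Fraction(3, 5), Fraction(3, 4)):
    print(p, [is_eigvec(n, p) for n in range(2, 9)])
```

Because the check reduces to $\sigma_{1j} = \sigma_{2j}$ for $j \ge 3$ together with the $2 \times 2$ block, it exercises exactly the equality (C.2) used in the induction step.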